{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Boosting\n",
    "## AdaBoost\n",
    "\n",
    "参考:\n",
    "[傻子都能看懂的——详解AdaBoost原理](https://blog.csdn.net/qq_38890412/article/details/120360354)\n",
    "[浅谈机器学习的梯度提升算法](https://github.com/apachecn/ml-mastery-zh/blob/7a3e2c2c60acdd90a85413a5af5d1241e09a6294/docs/xgboost/gentle-introduction-gradient-boosting-algorithm-machine-learning.md)"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 什么是加性模型(additive model)?\n",
    "\n",
    "$$\n",
    "H(\\boldsymbol{x})=\\sum_{t=1}^T \\alpha_t h_t(\\boldsymbol{x})\n",
    "$$"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## AdaBoost\n",
    "机器学习的本质是对根据数据给出特定假定，构建模型，将其作为一个数值优化问题。\n",
    "AdaBoost这类模型也类似, 通过梯度下降添加弱学习器来最小化模型损失。\n",
    "\n",
    "AdaBoost思想起源于:\n",
    "Hypothesis Boosting Problem: an efficient algorithm for converting relatively poor hypotheses into very good hypotheses.\n",
    "![](images/2021091809193281.png)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 假定一个加性模型\n",
    "$$\n",
    "H(\\boldsymbol{x})=\\sum_{t=1}^T \\alpha_t h_t(\\boldsymbol{x})\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\ell_{\\exp }(h \\mid \\mathcal{D})=\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) h(\\boldsymbol{x})}\\right]\n",
    "$$\n",
    "\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "下面考虑迭代过程中的最优化问题, 分别对$h_{t}(x)$和$\\alpha_{t}$求骗到可得到最优化的两个推论:\n",
    "- 推论1\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\ell_{\\exp }\\left(H_{t-1}+h_t \\mid \\mathcal{D}\\right) & \\approx \\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\left(1-f(\\boldsymbol{x}) h_t(\\boldsymbol{x})+\\frac{f(\\boldsymbol{x})^2 h_t(\\boldsymbol{x})^2}{2}\\right)\\right] \\\\\n",
    "&=\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\left(1-f(\\boldsymbol{x}) h_t(\\boldsymbol{x})+\\frac{1}{2}\\right)\\right], \\quad(2.10)\n",
    "\\end{aligned}\n",
    "$$\n",
    "by noticing that $f(\\boldsymbol{x})^2=1$ and $h_t(\\boldsymbol{x})^2=1$.\n",
    "Thus, the ideal classifier $h_t$ is\n",
    "$$\n",
    "\\begin{aligned}\n",
    "h_t(\\boldsymbol{x}) &=\\underset{h}{\\arg \\min } \\ell_{\\exp }\\left(H_{t-1}+h \\mid \\mathcal{D}\\right) \\\\\n",
    "&=\\underset{h}{\\arg \\min } \\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\left(1-f(\\boldsymbol{x}) h(\\boldsymbol{x})+\\frac{1}{2}\\right)\\right] \\\\\n",
    "&=\\underset{h}{\\arg \\max } \\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})} f(\\boldsymbol{x}) h(\\boldsymbol{x})\\right] \\\\\n",
    "&=\\underset{h}{\\arg \\max } \\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[\\frac{e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}}{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\right]} f(\\boldsymbol{x}) h(\\boldsymbol{x})\\right]\n",
    "\\end{aligned}\n",
    "$$\n",
    "by noticing that $\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\right]$ is a constant.\n",
    "Denote a distribution $\\mathcal{D}_t$ as\n",
    "$$\n",
    "\\mathcal{D}_t(x)=\\frac{\\mathcal{D}(x) e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}}{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\right]} .\n",
    "$$\n",
    "Then, by the definition of mathematical expectation, it is equivalent to write that\n",
    "$$\n",
    "\\begin{aligned}\n",
    "h_t(\\boldsymbol{x}) &=\\underset{h}{\\arg \\max } \\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[\\frac{e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}}{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\right]} f(\\boldsymbol{x}) h(\\boldsymbol{x})\\right] \\\\\n",
    "&=\\underset{h}{\\arg \\max } \\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}_t}[f(\\boldsymbol{x}) h(\\boldsymbol{x})] .\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\mathcal{D}_{t+1}(\\boldsymbol{x}) &=\\frac{\\mathcal{D}(x) e^{-f(\\boldsymbol{x}) H_t(\\boldsymbol{x})}}{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_t(\\boldsymbol{x})}\\right]} \\\\\n",
    "&=\\frac{\\mathcal{D}(\\boldsymbol{x}) e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})} e^{-f(\\boldsymbol{x}) \\alpha_t h_t(\\boldsymbol{x})}}{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_t(\\boldsymbol{x})}\\right]} \\\\\n",
    "&=\\mathcal{D}_t(\\boldsymbol{x}) \\cdot e^{-f(\\boldsymbol{x}) \\alpha_t h_t(\\boldsymbol{x})} \\frac{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_{t-1}(\\boldsymbol{x})}\\right]}{\\mathbb{E}_{\\boldsymbol{x} \\sim \\mathcal{D}}\\left[e^{-f(\\boldsymbol{x}) H_t(\\boldsymbol{x})}\\right]},\n",
    "\\end{aligned}\n",
    "$$\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "- 推论2\n",
    "$$\n",
    "\\frac{\\partial \\ell_{\\exp }\\left(\\alpha_t h_t \\mid \\mathcal{D}_t\\right)}{\\partial \\alpha_t}=-e^{-\\alpha_t}\\left(1-\\epsilon_t\\right)+e^{\\alpha_t} \\epsilon_t=0,\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\frac{\\partial \\ell_{\\exp }\\left(\\alpha_t h_t \\mid \\mathcal{D}_t\\right)}{\\partial \\alpha_t}=-e^{-\\alpha_t}\\left(1-\\epsilon_t\\right)+e^{\\alpha_t} \\epsilon_t=0,\n",
    "$$\n",
    "then the solution is\n",
    "$$\n",
    "\\alpha_t=\\frac{1}{2} \\ln \\left(\\frac{1-\\epsilon_t}{\\epsilon_t}\\right)\n",
    "$$\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "因此Adaboost算法迭代过程为:\n",
    "1. sample data: $\\{ (x_1, y_1), \\ldots, (x_i, y_i)) \\}, y_i \\in  \\ \\{1, \\ldots, K\\}$, Initialize the observation weights $w_i=1 / n, i=1,2, \\ldots, n$.\n",
    "2. For $m=1$ to $M$ :\n",
    "(a) Fit a classifier $T^{(m)}(\\boldsymbol{x})$ to the training data using weights $w_i$.\n",
    "(b) Compute\n",
    "$\\operatorname{err}^{(m)}=\\sum_{i=1}^n w_i I\\left(y_i \\neq T^{(m)}\\left(\\boldsymbol{x}_i\\right)\\right) / \\sum_{i=1}^n w_i $ .\n",
    "(c) Compute\n",
    "$\\alpha^{(m)}=\\log \\frac{1-e r r^{(m)}}{e r r^{(m)}}$ .\n",
    "(d) Set\n",
    "$w_i \\leftarrow w_i \\cdot \\exp \\left(\\alpha^{(m)} \\cdot I\\left(y_i \\neq T^{(m)}\\left(\\boldsymbol{x}_i\\right)\\right)\\right), i=1,2, \\ldots, n$ .\n",
    "(e) Re-normalize $w_i$.\n",
    "3. Output\n",
    "$$\n",
    "C(\\boldsymbol{x})=\\arg \\max _k \\sum_{m=1}^M \\alpha^{(m)} \\cdot I\\left(T^{(m)}(\\boldsymbol{x})=k\\right) .\n",
    "$$\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "\n",
    "# A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting\n",
    "# https://www.sciencedirect.com/science/article/pii/S002200009791504X\n",
    "# https://hastie.su.domains/Papers/samme.pdf\n",
    "AdaBoostClassifier().predict()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "https://projecteuclid.org/journals/annals-of-statistics/volume-29/issue-5/Greedy-function-approximation-A-gradient-boosting-machine/10.1214/aos/1013203451.full   GBDT\n",
    "\n",
    "https://arxiv.org/abs/1603.02754  xgboost\n",
    "\n",
    "\n",
    "其他:\n",
    "https://zhuanlan.zhihu.com/p/142115015"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "import pandas as pd\n",
    "import pymysql\n",
    "\n",
    "with pymysql.connect(\n",
    "        user='dbabigd',\n",
    "        password='dbabigddbabigd',\n",
    "        host='175.24.125.182',\n",
    "        port=9030,\n",
    "        database='crisps_order_center_v2'\n",
    ") as conn:\n",
    "    data = pd.read_sql_query(\n",
    "        sql=\"\"\"select a.cus_order_no as 客户单编号,cast(a.customer_id as char)as 客户id,a.customer_name,b.classify_three_name,a.create_time,a.cus_order_status_no\n",
    "from crisps_order_center_v2.order_cus a\n",
    "left join crisps_order_center_v2.order_sku b on a.cus_order_no = b.cus_order_no\n",
    "where b.classify_three_name not like '%测试%' and a.customer_name not like '%测试%' and\n",
    "a.cus_order_status_no not like 'ORDER_CUS_STATUS_CANCELLED'\n",
    "order by a.create_time asc\"\"\",\n",
    "        con=conn)\n",
    "    data.head()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [
    {
     "data": {
      "text/plain": "               客户单编号                 客户id   customer_name classify_three_name  \\\n0       C21122500002  8162886898785517568     138******16              有限公司注册   \n1       C21122500010  8091319090696617984     135******70               个人抵押贷   \n2       C21122500019  7957543119100125184     182******74              有限公司注册   \n3       C21122500046  8096575238874005504     155******46                法律咨询   \n4       C21122500049  7822989022025662464             夏茂源                个体变更   \n5       C21122500051  8091319090696617984     135******70               公积金代理   \n6       C21122500090  8091319090696617984     135******70                个体变更   \n7       C21122500091  7957543119100125184     182******74              有限公司注册   \n8       C21122500097  8068071036795289600     183******50                个体变更   \n9       C21122500171  7822989022025662464             夏茂源                个体变更   \n10      C21122500184  8068071036795289600     183******50                个体变更   \n11      C21122500195  8085416071761035264        研发中心技术小弟                个体变更   \n12      C21122500205  8091319090696617984     135******70              企业社保代理   \n13      C21122500217  8116578530924232704     158******90               车辆质押贷   \n14      C21122500222  8103926078664146944     150******50                 发票贷   \n15      C21122500237  8076076052575485952     133******63                个体变更   \n16      C21122500258  7957543119100125184     182******74              有限公司注册   \n17      C21122500296  7822989022025662464             夏茂源                个体变更   \n18      C21122500298  8091319090696617984     135******70               公积金代理   \n19      C21122500313  8116578530924232704     158******90                 发票贷   \n20      C21122500322  7822989022025662464             夏茂源                个体变更   \n21      C21122500345  7822989022025662464             夏茂源                个体变更   \n22      C21122500355  8121752592688414720     153******96              企业社保代理   \n23      C21122500358  8115866167673683968     151******85                 发票贷   \n24      C21122500384  8115866167673683968     151******85                 发票贷   \n25      C21122500391  8098483402775003136              毛豆               车辆质押贷   \n26      C21122500391  8098483402775003136              毛豆                 发票贷   \n27      C21122500392  8109043473196056576     158******02                个体变更   \n28      C21122500401  8091319090696617984     135******70              一般记账新签   \n29      C21122500426  8076020906847961088              杨帆                 发票贷   \n...              ...                  ...             ...                 ...   \n141311  C22091900256  2372293132733609715     137******92              有限公司注册   \n141312  C22091900257  2372514512527912859     135******76        互联网资质附属_人员处理   \n141313  C22091900257  2372514512527912859     135******76            营业性演出许可证   \n141314  C22091900258  2372351200445838893     171******53                商标注册   \n141315  C22091900259                75316    成都真趣文化传播有限公司                报表编制   \n141316  C22091900259                75316    成都真趣文化传播有限公司             小规模记账续费   \n141317  C22091900260  8089338750135500800    广东耀彩建设工程有限公司                专利年费   \n141318  C22091900261  1362078169271247585     195******77                债权债务   \n141319  C22091900263  2371982074267953919     188******88                商标注册   \n141320  C22091900266  2372438336987947546     186******00              有限公司注册   \n141321  C22091900267                51396             邱泽亮             小规模记账续费   \n141322  C22091900271  8065398200285724672  北京华清御都商务服务有限公司                任职变更   \n141323  C22091900271  8065398200285724672  北京华清御都商务服务有限公司              经营范围变更   \n141324  C22091900276  8149161135330623488     176******39              变更套餐服务   \n141325  C22091900278  8246617696214253568     130******66             个体工商户注册   \n141326  C22091900280  7693027011266490368             张先生                换刻印章   \n141327  C22091900282             16164666   成都鑫胜达房产经纪有限公司             小规模记账续费   \n141328  C22091900283  7992674694086336512              姜杨                公积金贷   \n141329  C22091900283  7992674694086336512              姜杨                 经营贷   \n141330  C22091900285               138694  成都水木年华文化传播有限公司             小规模记账续费   \n141331  C22091900286             16776161   四川展毅达文化传播有限公司             小规模记账续费   \n141332  C22091900286             16776161   四川展毅达文化传播有限公司                年报申报   \n141333  C22091900286             16776161   四川展毅达文化传播有限公司                报表编制   \n141334  C22091900288  8119151582353096704             徐志高                年报申报   \n141335  C22091900288  8119151582353096704             徐志高                报表编制   \n141336  C22091900288  8119151582353096704             徐志高             小规模记账续费   \n141337  C22091900289  8065398200285724672  北京华清御都商务服务有限公司                公司注销   \n141338  C22091900290               117339   四川宝利来投资管理有限公司              有限公司注册   \n141339  C22091900290               117339   四川宝利来投资管理有限公司             小规模记账新签   \n141340  C22091900290               117339   四川宝利来投资管理有限公司                报表编制   \n\n               create_time           cus_order_status_no  \n0      2021-12-25 10:08:20  ORDER_CUS_STATUS_PROGRESSING  \n1      2021-12-25 11:00:23  ORDER_CUS_STATUS_PROGRESSING  \n2      2021-12-25 11:07:23  ORDER_CUS_STATUS_PROGRESSING  \n3      2021-12-25 11:56:28  ORDER_CUS_STATUS_PROGRESSING  \n4      2021-12-25 13:23:24    ORDER_CUS_STATUS_COMPLETED  \n5      2021-12-25 13:28:53  ORDER_CUS_STATUS_PROGRESSING  \n6      2021-12-25 14:20:20  ORDER_CUS_STATUS_PROGRESSING  \n7      2021-12-25 14:20:26  ORDER_CUS_STATUS_PROGRESSING  \n8      2021-12-25 14:44:15  ORDER_CUS_STATUS_PROGRESSING  \n9      2021-12-25 15:38:26  ORDER_CUS_STATUS_PROGRESSING  \n10     2021-12-25 15:43:59  ORDER_CUS_STATUS_PROGRESSING  \n11     2021-12-25 15:47:57  ORDER_CUS_STATUS_PROGRESSING  \n12     2021-12-25 15:55:06  ORDER_CUS_STATUS_PROGRESSING  \n13     2021-12-25 15:59:44  ORDER_CUS_STATUS_PROGRESSING  \n14     2021-12-25 16:01:34  ORDER_CUS_STATUS_PROGRESSING  \n15     2021-12-25 16:13:18  ORDER_CUS_STATUS_PROGRESSING  \n16     2021-12-25 16:26:10  ORDER_CUS_STATUS_PROGRESSING  \n17     2021-12-25 16:43:21  ORDER_CUS_STATUS_PROGRESSING  \n18     2021-12-25 16:44:15  ORDER_CUS_STATUS_PROGRESSING  \n19     2021-12-25 16:53:45  ORDER_CUS_STATUS_PROGRESSING  \n20     2021-12-25 17:00:22  ORDER_CUS_STATUS_PROGRESSING  \n21     2021-12-25 17:12:32  ORDER_CUS_STATUS_PROGRESSING  \n22     2021-12-25 17:20:05  ORDER_CUS_STATUS_PROGRESSING  \n23     2021-12-25 17:25:55  ORDER_CUS_STATUS_PROGRESSING  \n24     2021-12-25 18:13:23  ORDER_CUS_STATUS_PROGRESSING  \n25     2021-12-25 18:25:53  ORDER_CUS_STATUS_PROGRESSING  \n26     2021-12-25 18:25:53  ORDER_CUS_STATUS_PROGRESSING  \n27     2021-12-25 18:29:42  ORDER_CUS_STATUS_PROGRESSING  \n28     2021-12-25 18:41:00  ORDER_CUS_STATUS_PROGRESSING  \n29     2021-12-25 19:14:04  ORDER_CUS_STATUS_PROGRESSING  \n...                    ...                           ...  \n141311 2022-09-19 11:03:49  ORDER_CUS_STATUS_PROGRESSING  \n141312 2022-09-19 11:04:11       ORDER_CUS_STATUS_UNPAID  \n141313 2022-09-19 11:04:11       ORDER_CUS_STATUS_UNPAID  \n141314 2022-09-19 11:05:25    ORDER_CUS_STATUS_UNSUBMITE  \n141315 2022-09-19 11:05:34    ORDER_CUS_STATUS_UNSUBMITE  \n141316 2022-09-19 11:05:34    ORDER_CUS_STATUS_UNSUBMITE  \n141317 2022-09-19 11:05:38  ORDER_CUS_STATUS_PROGRESSING  \n141318 2022-09-19 11:05:45       ORDER_CUS_STATUS_UNPAID  \n141319 2022-09-19 11:07:14    ORDER_CUS_STATUS_UNSUBMITE  \n141320 2022-09-19 11:08:48    ORDER_CUS_STATUS_UNSUBMITE  \n141321 2022-09-19 11:08:49    ORDER_CUS_STATUS_UNSUBMITE  \n141322 2022-09-19 11:11:00    ORDER_CUS_STATUS_UNSUBMITE  \n141323 2022-09-19 11:11:00    ORDER_CUS_STATUS_UNSUBMITE  \n141324 2022-09-19 11:12:53       ORDER_CUS_STATUS_UNPAID  \n141325 2022-09-19 11:14:11  ORDER_CUS_STATUS_PROGRESSING  \n141326 2022-09-19 11:14:51  ORDER_CUS_STATUS_PROGRESSING  \n141327 2022-09-19 11:14:55    ORDER_CUS_STATUS_UNSUBMITE  \n141328 2022-09-19 11:15:10  ORDER_CUS_STATUS_PROGRESSING  \n141329 2022-09-19 11:15:10  ORDER_CUS_STATUS_PROGRESSING  \n141330 2022-09-19 11:16:44  ORDER_CUS_STATUS_PROGRESSING  \n141331 2022-09-19 11:17:37       ORDER_CUS_STATUS_UNPAID  \n141332 2022-09-19 11:17:37       ORDER_CUS_STATUS_UNPAID  \n141333 2022-09-19 11:17:37       ORDER_CUS_STATUS_UNPAID  \n141334 2022-09-19 11:18:08    ORDER_CUS_STATUS_UNSUBMITE  \n141335 2022-09-19 11:18:08    ORDER_CUS_STATUS_UNSUBMITE  \n141336 2022-09-19 11:18:08    ORDER_CUS_STATUS_UNSUBMITE  \n141337 2022-09-19 11:18:55    ORDER_CUS_STATUS_UNSUBMITE  \n141338 2022-09-19 11:19:03    ORDER_CUS_STATUS_UNSUBMITE  \n141339 2022-09-19 11:19:03    ORDER_CUS_STATUS_UNSUBMITE  \n141340 2022-09-19 11:19:03    ORDER_CUS_STATUS_UNSUBMITE  \n\n[141341 rows x 6 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>客户单编号</th>\n      <th>客户id</th>\n      <th>customer_name</th>\n      <th>classify_three_name</th>\n      <th>create_time</th>\n      <th>cus_order_status_no</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>C21122500002</td>\n      <td>8162886898785517568</td>\n      <td>138******16</td>\n      <td>有限公司注册</td>\n      <td>2021-12-25 10:08:20</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>C21122500010</td>\n      <td>8091319090696617984</td>\n      <td>135******70</td>\n      <td>个人抵押贷</td>\n      <td>2021-12-25 11:00:23</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>C21122500019</td>\n      <td>7957543119100125184</td>\n      <td>182******74</td>\n      <td>有限公司注册</td>\n      <td>2021-12-25 11:07:23</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>C21122500046</td>\n      <td>8096575238874005504</td>\n      <td>155******46</td>\n      <td>法律咨询</td>\n      <td>2021-12-25 11:56:28</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>C21122500049</td>\n      <td>7822989022025662464</td>\n      <td>夏茂源</td>\n      <td>个体变更</td>\n      <td>2021-12-25 13:23:24</td>\n      <td>ORDER_CUS_STATUS_COMPLETED</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>C21122500051</td>\n      <td>8091319090696617984</td>\n      <td>135******70</td>\n      <td>公积金代理</td>\n      <td>2021-12-25 13:28:53</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>C21122500090</td>\n      <td>8091319090696617984</td>\n      <td>135******70</td>\n      <td>个体变更</td>\n      <td>2021-12-25 14:20:20</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>C21122500091</td>\n      <td>7957543119100125184</td>\n      <td>182******74</td>\n      <td>有限公司注册</td>\n      <td>2021-12-25 14:20:26</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>C21122500097</td>\n      <td>8068071036795289600</td>\n      <td>183******50</td>\n      <td>个体变更</td>\n      <td>2021-12-25 14:44:15</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>C21122500171</td>\n      <td>7822989022025662464</td>\n      <td>夏茂源</td>\n      <td>个体变更</td>\n      <td>2021-12-25 15:38:26</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>C21122500184</td>\n      <td>8068071036795289600</td>\n      <td>183******50</td>\n      <td>个体变更</td>\n      <td>2021-12-25 15:43:59</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>11</th>\n      <td>C21122500195</td>\n      <td>8085416071761035264</td>\n      <td>研发中心技术小弟</td>\n      <td>个体变更</td>\n      <td>2021-12-25 15:47:57</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>12</th>\n      <td>C21122500205</td>\n      <td>8091319090696617984</td>\n      <td>135******70</td>\n      <td>企业社保代理</td>\n      <td>2021-12-25 15:55:06</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>C21122500217</td>\n      <td>8116578530924232704</td>\n      <td>158******90</td>\n      <td>车辆质押贷</td>\n      <td>2021-12-25 15:59:44</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>14</th>\n      <td>C21122500222</td>\n      <td>8103926078664146944</td>\n      <td>150******50</td>\n      <td>发票贷</td>\n      <td>2021-12-25 16:01:34</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>15</th>\n      <td>C21122500237</td>\n      <td>8076076052575485952</td>\n      <td>133******63</td>\n      <td>个体变更</td>\n      <td>2021-12-25 16:13:18</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>16</th>\n      <td>C21122500258</td>\n      <td>7957543119100125184</td>\n      <td>182******74</td>\n      <td>有限公司注册</td>\n      <td>2021-12-25 16:26:10</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>17</th>\n      <td>C21122500296</td>\n      <td>7822989022025662464</td>\n      <td>夏茂源</td>\n      <td>个体变更</td>\n      <td>2021-12-25 16:43:21</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>18</th>\n      <td>C21122500298</td>\n      <td>8091319090696617984</td>\n      <td>135******70</td>\n      <td>公积金代理</td>\n      <td>2021-12-25 16:44:15</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>19</th>\n      <td>C21122500313</td>\n      <td>8116578530924232704</td>\n      <td>158******90</td>\n      <td>发票贷</td>\n      <td>2021-12-25 16:53:45</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>20</th>\n      <td>C21122500322</td>\n      <td>7822989022025662464</td>\n      <td>夏茂源</td>\n      <td>个体变更</td>\n      <td>2021-12-25 17:00:22</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>21</th>\n      <td>C21122500345</td>\n      <td>7822989022025662464</td>\n      <td>夏茂源</td>\n      <td>个体变更</td>\n      <td>2021-12-25 17:12:32</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>22</th>\n      <td>C21122500355</td>\n      <td>8121752592688414720</td>\n      <td>153******96</td>\n      <td>企业社保代理</td>\n      <td>2021-12-25 17:20:05</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>23</th>\n      <td>C21122500358</td>\n      <td>8115866167673683968</td>\n      <td>151******85</td>\n      <td>发票贷</td>\n      <td>2021-12-25 17:25:55</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>24</th>\n      <td>C21122500384</td>\n      <td>8115866167673683968</td>\n      <td>151******85</td>\n      <td>发票贷</td>\n      <td>2021-12-25 18:13:23</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>25</th>\n      <td>C21122500391</td>\n      <td>8098483402775003136</td>\n      <td>毛豆</td>\n      <td>车辆质押贷</td>\n      <td>2021-12-25 18:25:53</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>26</th>\n      <td>C21122500391</td>\n      <td>8098483402775003136</td>\n      <td>毛豆</td>\n      <td>发票贷</td>\n      <td>2021-12-25 18:25:53</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>27</th>\n      <td>C21122500392</td>\n      <td>8109043473196056576</td>\n      <td>158******02</td>\n      <td>个体变更</td>\n      <td>2021-12-25 18:29:42</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>28</th>\n      <td>C21122500401</td>\n      <td>8091319090696617984</td>\n      <td>135******70</td>\n      <td>一般记账新签</td>\n      <td>2021-12-25 18:41:00</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>29</th>\n      <td>C21122500426</td>\n      <td>8076020906847961088</td>\n      <td>杨帆</td>\n      <td>发票贷</td>\n      <td>2021-12-25 19:14:04</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>141311</th>\n      <td>C22091900256</td>\n      <td>2372293132733609715</td>\n      <td>137******92</td>\n      <td>有限公司注册</td>\n      <td>2022-09-19 11:03:49</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141312</th>\n      <td>C22091900257</td>\n      <td>2372514512527912859</td>\n      <td>135******76</td>\n      <td>互联网资质附属_人员处理</td>\n      <td>2022-09-19 11:04:11</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141313</th>\n      <td>C22091900257</td>\n      <td>2372514512527912859</td>\n      <td>135******76</td>\n      <td>营业性演出许可证</td>\n      <td>2022-09-19 11:04:11</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141314</th>\n      <td>C22091900258</td>\n      <td>2372351200445838893</td>\n      <td>171******53</td>\n      <td>商标注册</td>\n      <td>2022-09-19 11:05:25</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141315</th>\n      <td>C22091900259</td>\n      <td>75316</td>\n      <td>成都真趣文化传播有限公司</td>\n      <td>报表编制</td>\n      <td>2022-09-19 11:05:34</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141316</th>\n      <td>C22091900259</td>\n      <td>75316</td>\n      <td>成都真趣文化传播有限公司</td>\n      <td>小规模记账续费</td>\n      <td>2022-09-19 11:05:34</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141317</th>\n      <td>C22091900260</td>\n      <td>8089338750135500800</td>\n      <td>广东耀彩建设工程有限公司</td>\n      <td>专利年费</td>\n      <td>2022-09-19 11:05:38</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141318</th>\n      <td>C22091900261</td>\n      <td>1362078169271247585</td>\n      <td>195******77</td>\n      <td>债权债务</td>\n      <td>2022-09-19 11:05:45</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141319</th>\n      <td>C22091900263</td>\n      <td>2371982074267953919</td>\n      <td>188******88</td>\n      <td>商标注册</td>\n      <td>2022-09-19 11:07:14</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141320</th>\n      <td>C22091900266</td>\n      <td>2372438336987947546</td>\n      <td>186******00</td>\n      <td>有限公司注册</td>\n      <td>2022-09-19 11:08:48</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141321</th>\n      <td>C22091900267</td>\n      <td>51396</td>\n      <td>邱泽亮</td>\n      <td>小规模记账续费</td>\n      <td>2022-09-19 11:08:49</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141322</th>\n      <td>C22091900271</td>\n      <td>8065398200285724672</td>\n      <td>北京华清御都商务服务有限公司</td>\n      <td>任职变更</td>\n      <td>2022-09-19 11:11:00</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141323</th>\n      <td>C22091900271</td>\n      <td>8065398200285724672</td>\n      <td>北京华清御都商务服务有限公司</td>\n      <td>经营范围变更</td>\n      <td>2022-09-19 11:11:00</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141324</th>\n      <td>C22091900276</td>\n      <td>8149161135330623488</td>\n      <td>176******39</td>\n      <td>变更套餐服务</td>\n      <td>2022-09-19 11:12:53</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141325</th>\n      <td>C22091900278</td>\n      <td>8246617696214253568</td>\n      <td>130******66</td>\n      <td>个体工商户注册</td>\n      <td>2022-09-19 11:14:11</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141326</th>\n      <td>C22091900280</td>\n      <td>7693027011266490368</td>\n      <td>张先生</td>\n      <td>换刻印章</td>\n      <td>2022-09-19 11:14:51</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141327</th>\n      <td>C22091900282</td>\n      <td>16164666</td>\n      <td>成都鑫胜达房产经纪有限公司</td>\n      <td>小规模记账续费</td>\n      <td>2022-09-19 11:14:55</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141328</th>\n      <td>C22091900283</td>\n      <td>7992674694086336512</td>\n      <td>姜杨</td>\n      <td>公积金贷</td>\n      <td>2022-09-19 11:15:10</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141329</th>\n      <td>C22091900283</td>\n      <td>7992674694086336512</td>\n      <td>姜杨</td>\n      <td>经营贷</td>\n      <td>2022-09-19 11:15:10</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141330</th>\n      <td>C22091900285</td>\n      <td>138694</td>\n      <td>成都水木年华文化传播有限公司</td>\n      <td>小规模记账续费</td>\n      <td>2022-09-19 11:16:44</td>\n      <td>ORDER_CUS_STATUS_PROGRESSING</td>\n    </tr>\n    <tr>\n      <th>141331</th>\n      <td>C22091900286</td>\n      <td>16776161</td>\n      <td>四川展毅达文化传播有限公司</td>\n      <td>小规模记账续费</td>\n      <td>2022-09-19 11:17:37</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141332</th>\n      <td>C22091900286</td>\n      <td>16776161</td>\n      <td>四川展毅达文化传播有限公司</td>\n      <td>年报申报</td>\n      <td>2022-09-19 11:17:37</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141333</th>\n      <td>C22091900286</td>\n      <td>16776161</td>\n      <td>四川展毅达文化传播有限公司</td>\n      <td>报表编制</td>\n      <td>2022-09-19 11:17:37</td>\n      <td>ORDER_CUS_STATUS_UNPAID</td>\n    </tr>\n    <tr>\n      <th>141334</th>\n      <td>C22091900288</td>\n      <td>8119151582353096704</td>\n      <td>徐志高</td>\n      <td>年报申报</td>\n      <td>2022-09-19 11:18:08</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141335</th>\n      <td>C22091900288</td>\n      <td>8119151582353096704</td>\n      <td>徐志高</td>\n      <td>报表编制</td>\n      <td>2022-09-19 11:18:08</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141336</th>\n      <td>C22091900288</td>\n      <td>8119151582353096704</td>\n      <td>徐志高</td>\n      <td>小规模记账续费</td>\n      <td>2022-09-19 11:18:08</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141337</th>\n      <td>C22091900289</td>\n      <td>8065398200285724672</td>\n      <td>北京华清御都商务服务有限公司</td>\n      <td>公司注销</td>\n      <td>2022-09-19 11:18:55</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141338</th>\n      <td>C22091900290</td>\n      <td>117339</td>\n      <td>四川宝利来投资管理有限公司</td>\n      <td>有限公司注册</td>\n      <td>2022-09-19 11:19:03</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141339</th>\n      <td>C22091900290</td>\n      <td>117339</td>\n      <td>四川宝利来投资管理有限公司</td>\n      <td>小规模记账新签</td>\n      <td>2022-09-19 11:19:03</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n    <tr>\n      <th>141340</th>\n      <td>C22091900290</td>\n      <td>117339</td>\n      <td>四川宝利来投资管理有限公司</td>\n      <td>报表编制</td>\n      <td>2022-09-19 11:19:03</td>\n      <td>ORDER_CUS_STATUS_UNSUBMITE</td>\n    </tr>\n  </tbody>\n</table>\n<p>141341 rows × 6 columns</p>\n</div>"
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "outputs": [
    {
     "data": {
      "text/plain": "       客户id       商品集合\n客户id                  \n111  0  111  {E, A, C}\n112  0  112        {B}\n113  0  113        {D}",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th></th>\n      <th>客户id</th>\n      <th>商品集合</th>\n    </tr>\n    <tr>\n      <th>客户id</th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>111</th>\n      <th>0</th>\n      <td>111</td>\n      <td>{E, A, C}</td>\n    </tr>\n    <tr>\n      <th>112</th>\n      <th>0</th>\n      <td>112</td>\n      <td>{B}</td>\n    </tr>\n    <tr>\n      <th>113</th>\n      <th>0</th>\n      <td>113</td>\n      <td>{D}</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "data = pd.DataFrame(\n",
    "    {\n",
    "        '客户id': ['111', '112', '111', '113', '111'],\n",
    "        '商品id': ['A', 'B', 'C', 'D', 'E'],\n",
    "        'date': ['2022-03-01', '2022-07-31', '2022-06-01', '2020-01-13', '2022-07-13']  # date为字符串类型\n",
    "    }\n",
    ")\n",
    "\n",
    "\n",
    "def get_custom(x):\n",
    "    user_id = x['客户id'].values[0]\n",
    "    last_date = None\n",
    "    collector = []\n",
    "    for each in x[['date', '商品id']].values:\n",
    "        date = each[0]\n",
    "        if last_date is None:\n",
    "            collector.append(each[1])\n",
    "\n",
    "        if last_date is not None:\n",
    "            if abs((datetime.strptime(last_date, '%Y-%m-%d') - datetime.strptime(date, '%Y-%m-%d')).days) < 100:\n",
    "                collector.append(each[1])\n",
    "            else:\n",
    "                return pd.DataFrame(\n",
    "                    [(user_id, set(collector))],\n",
    "                    columns=['客户id', '商品集合']\n",
    "                )\n",
    "\n",
    "        last_date = date\n",
    "\n",
    "    return pd.DataFrame(\n",
    "        [(user_id, set(collector))],\n",
    "        columns=['客户id', '商品集合']\n",
    "    )\n",
    "\n",
    "\n",
    "data = data.sort_values(by='date', ascending=False)\n",
    "user_record = data.groupby('客户id').apply(get_custom)\n",
    "user_record"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "outputs": [
    {
     "data": {
      "text/plain": "       客户id    商品集合\n客户id               \n111  0  111  {A, C}",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th></th>\n      <th>客户id</th>\n      <th>商品集合</th>\n    </tr>\n    <tr>\n      <th>客户id</th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>111</th>\n      <th>0</th>\n      <td>111</td>\n      <td>{A, C}</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_record[[len(each) > 1 for each in user_record['商品集合'].values]]"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "outputs": [
    {
     "data": {
      "text/plain": "92"
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "(datetime.strptime('2022-06-01', '%Y-%m-%d') - datetime.strptime('2022-03-01', '%Y-%m-%d')).days"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}